resolves #272 #275

thanasions · 2019-05-20T09:58:07Z

No description provided.

IanGrimstead

👍

* argschecker updated #178
* Reverted to latest pmdarima (#212)
* Removed erroneous 2nd arima fit (#212): arima fits on construction, so there is no need to call 'fit' explicitly
* Reverted to original test specification (#212)
* Added version debug code (#212), as requested on pmdarima bug reporting page
* Report warnings and errors (#212): may have accidentally been suppressing errors that could be reporting why the test fails
* resolves #217
* test
* changed the tests to use real data, to check whether random numbers were confusing the models (hence the big discrepancies)
* Updated Arima to use pmdarima rather than pyramid-arima (#212)
* Test pmdarima 1.0.0 on Windows (#212): seeing if an earlier version of pmdarima works on Windows
* emtech report to file!
* Reverted to latest pmdarima (#212)
* Removed erroneous 2nd arima fit (#212): arima fits on construction, so there is no need to call 'fit' explicitly
* Reverted to original test specification (#212)
* Added version debug code (#212), as requested on pmdarima bug reporting page
* Report warnings and errors (#212): may have accidentally been suppressing errors that could be reporting why the test fails
* analyzer ngrams processing was not stopping unigrams :)
* adjusted tests to reflect bug fixes in stoplists processing
* added a check on the returned tuple for stopwords. That will enable users to optimize the list without having to re-compute tf-idf
* pmdarima>=110
* added a check on the returned tuple for stopwords. That will enable users to optimize the list without having to re-compute tf-idf
* rid of vectorizer. Only vocabulary needed
* 225 ridof pmdarima (#226)
* rid of vectorizer. Only vocabulary needed
* rid of pmd. Also realized that two of our test series were identical. No need to test them twice :)
* pmd left.
* just to check why one excepts and the other doesn't
* rid of vectorizer. Only vocabulary needed
* scipy was the problem, in the end. Has to be >=1.2.1
* 223 pipeline bug (#224)
* rid of vectorizer. Only vocabulary needed
* pickle-depickle tfidf test now represents different executions (#223): WordAnalyser reset between calls to main(), which will catch if stopwords etc. are not populated
* Travis now reports python packages in use: added `pip freeze` to travis.yml
* Corrected pip listing of packages
* 228 data path (#229)
* Removed override to 'data' path and added date info #228: now reports date range of patents in use
* Removed 2nd construction of WordAnalyser #228
* 230 arima failing (#231)
* Alternative method to annoy ARIMA #230
* 227 bug csv date (#233)
* Testing python 3.7.3 via pip and *correctly* switch to Xenial linux (#227)
* Checks if DF date column is a string and converts to datetime #227
* Oops. Test failing as date_column not always corrected to datetime #227
* csv dates come as strings. Type-check to see what's going on and conv… (#232)
* moved things around a bit. Type check after df creation, inside the not-read-from-pickle clause. If read from pickle, that should have been taken care of.
* Remove leading zero trimming (#235) (#239)
* added argument for embeddings threshold
* resolves #250 (#251)
* scipy==1.2.1 else breaks
* new gensim breaks windows! Force 3.4.0
* filtering rows now gets rid of corresponding rows in df (#249)
* filtering rows now gets rid of corresponding rows in df
* gensim & scipy version limited due to introduced instability in current versions
* Update pygrams.py Co-Authored-By: emily-tew <38726410+emily-tew@users.noreply.github.com>
* 248 tfidf filter (#254)
* Added prefilter of terms (#248)
* del
* Update README.md: missing `.` on `pip install -e .`
* Corrected check for empty CPC list (#261)
* cache 2 initial commit! (#269)
* cache 2 initial commit!
* fix-imports was calling the properties and populating tfidf_mat. Disabled it. Plus some cosmetics
* helper function to safeguard from None idf or tfidf
* 257 add nmf code (#271): added NMF output
* resolves #272 (#275)
* Dictionary used to store CPC rather than list inside data frame
* 273 dates as ints (#277)
* Dates now pickled as integer array to save space (#273); tidied up date related utilities (added to date_utils from utils); renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso; column filter removed from DocumentsFilter; removed time and CPC document weighting
* Update README.md
* 279 small adjustments (#280)
* Dates now pickled as integer array to save space (#273)
* Tidied up date related utilities - added to date_utils from utils
* Renamed 'iso dates' to 'year_week' dates to avoid confusion with 'real' iso
* Column filter removed from DocumentsFilter
* Removed time and CPC document weighting
* Removed unused parameters and synchronised variable names (#273)
* Added timing report and progress reports
* 278 move mask (#283)
* resolves #278
* Changed folders for cached outputs (#281) (#284)
* 285 data uspto (#286)
* error checks change...
* resolves #286
* 287 update system requirements section (#288)
* Updated System Performance section (System Requirements)
* minor mods
* Small bug (#289)
* threshold not a list
* save time series to file (#270)
* Update README.md: -it option was outdated
* 291 bug (#292): resolves #291
* 294 fb (#295)
* resolves #294
* 296 emtech facelift (#297)
* resolves #296
* 298 nltk installation (#299): NLTK data now downloaded during execution of `pip install` (fixes #298)
* 256 tech report 2 (#301): resolves #256
* Ch comments (#304)
* ch comments
* Checking changes were propagated correctly #256 (#305)
* Checking changes were propagated correctly #256
* Checking changes were propagated correctly - more missing #256
* Few American spellings caught #256
* Exponential emergence (#306)
* add exponential emergence
* #255 convert r scripts (#308): state space model, resolves #308 #255
* General facelift
* Refactoring for readability
* Corrected issue with calculation of Porter (was using head not tail of dataset)
* State space (#317)
* cache state-space data!
* two-stage grid search
* Corrected test with duplicated args (good spot...). Now copes if min/max time series dates are not defined
* If smoothing not requested, ensure None is returned for smoothed dictionary
* Default predictor set now excludes LSTMs
* 319 cache (#320)
* #319 updated code and tests to reflect new cache usage
* 321 test stopwords (#322)
* #321 added stopwords to test folder for test specific variant
* #319 consistent cmd line args, GloVe can now be placed anywhere
* 315 clamp redo (#323)
* #315 clamp smoothed values at 0
* cast smoothed data back to lists (from numpy arrays) for consistency
* command line args now restricted to available smoothing and emergence
* added simple test for holt-winters to confirm -ve values not handled
* 326 mpq (#327)
* mpq tweak and cached data
* #328 added tests for example command line (#329)
* #328 added tests for example command line
* fixed: date not defined when not required causes failure
* #328 corrected execution folder for README tests
* Corrected merge
* Whitespace changes ready for merge to master
* Cleanup state space modelling
* Whups. Now checks tests again and only runs on travis... and not win32
* 324 state space predictions (#325)
* #324 create table from state space results - work in progress
* tests TBA
* first commit
* #324 create table from state space results - with tests
* Trimmed SD not implemented
* #324 Trimmed SD implemented
* #324 report window size to HTML
* #324 WIP - needs refinement, but works for non-test. Test may blow graph generation.
* #328 multiplot added as option
* Cleanup state space modelling
* Whups. Now checks tests again and only runs on travis... and not win32
* Merge issue with SSM

resolves #272

db40cf0

thanasions requested a review from IanGrimstead May 20, 2019 09:58

thanasions and others added 2 commits May 20, 2019 11:22

tests bug fixed

882141e

Merge branch 'develop' into 272_cpc_dict

82b837d

IanGrimstead approved these changes May 20, 2019

IanGrimstead merged commit 856789b into develop May 20, 2019

IanGrimstead deleted the 272_cpc_dict branch May 20, 2019 10:40
